A procedure for assessing GO annotation consistency

Authors

  • Mary E. Dolan
  • Li Ni
  • Evelyn Camon
  • Judith A. Blake
Abstract

Motivation: The Gene Ontology (GO) is widely used to annotate molecular attributes of genes and gene products. Multiple groups undertaking functional annotations of genomes contribute their annotation sets to the GO database resource, and these data are subsequently used in comparative functional analysis research. Although GO curators adhere to the same protocols and standards while assigning GO annotations, the specific procedure followed by each annotation group can vary. Since differences in application of annotation standards would dilute the effectiveness of comparative analysis, methods for assessing annotation consistency are essential. The development of methodologies that are broadly applicable for the assessment of GO annotation consistency is an important issue for the comparative genomics community.

Results: We have developed a methodology for assessing the consistency of GO annotations provided by different annotation groups. The method is completely general and can be applied to compare any two sets of GO annotations. This is the first attempt to assess cross-species GO annotation consistency. Our method compares annotation sets utilizing the hierarchical structure of the GO to compare GO annotations between orthologous gene pairs. The method produces a report on the annotation consistency and inconsistency for each orthologous pair. We present results obtained by comparing GO annotations for mouse and human gene sets.

Availability: The complete current MGI_GOA GO annotation consistency report is available online at http://www.spatial.maine.edu/~mdolan/
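The idea of using the GO hierarchy to compare annotations of an orthologous pair can be illustrated with a minimal sketch. It is not the authors' implementation: the toy hierarchy `PARENTS`, the function names, and the three labels ("identical", "same lineage", "different lineage") are assumptions made for illustration only, and the abstract does not specify the categories the actual report uses.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy fragment of a GO-like hierarchy: term -> set of parent terms.
# A real analysis would load the full ontology from a GO release instead.
PARENTS: Dict[str, Set[str]] = {
    "molecular function": set(),
    "binding": {"molecular function"},
    "catalytic activity": {"molecular function"},
    "nucleic acid binding": {"binding"},
    "DNA binding": {"nucleic acid binding"},
    "kinase activity": {"catalytic activity"},
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every term reachable by following parent links upward."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def compare_terms(a: str, b: str, parents: Dict[str, Set[str]]) -> str:
    """Classify two terms by their relative position in the hierarchy."""
    if a == b:
        return "identical"
    if a in ancestors(b, parents) or b in ancestors(a, parents):
        # One term is a more general version of the other.
        return "same lineage"
    return "different lineage"


def pair_report(
    annots_1: Set[str], annots_2: Set[str], parents: Dict[str, Set[str]]
) -> Dict[Tuple[str, str], str]:
    """Compare every annotation of one gene with every annotation of its ortholog."""
    return {
        (a, b): compare_terms(a, b, parents)
        for a in sorted(annots_1)
        for b in sorted(annots_2)
    }


if __name__ == "__main__":
    # Hypothetical annotations for one mouse gene and its human ortholog.
    mouse = {"DNA binding"}
    human = {"binding", "kinase activity"}
    for pair, verdict in pair_report(mouse, human, PARENTS).items():
        print(pair, verdict)
```

In this sketch a pair of annotations counts as compatible when one term is an ancestor of the other, which reflects differing annotation depth rather than disagreement; terms on separate branches are flagged for review.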

Related resources

Mining GO Annotations for Improving Annotation Consistency

Despite the structure and objectivity provided by the Gene Ontology (GO), the annotation of proteins is a complex task that is subject to errors and inconsistencies. Electronically inferred annotations in particular are widely considered unreliable. However, given that manual curation of all GO annotations is unfeasible, it is imperative to improve the quality of electronically inferred annotat...

The use of semantic similarity measures for optimally integrating heterogeneous Gene Ontology data from large scale annotation pipelines

With the advancement of new high throughput sequencing technologies, there has been an increase in the number of genome sequencing projects worldwide, which has yielded complete genome sequences of human, animals and plants. Subsequently, several labs have focused on genome annotation, consisting of assigning functions to gene products, mostly using Gene Ontology (GO) terms. As a consequence, t...

A Factor Graph Approach to Automated GO Annotation

As volume of genomic data grows, computational methods become essential for providing a first glimpse onto gene annotations. Automated Gene Ontology (GO) annotation methods based on hierarchical ensemble classification techniques are particularly interesting when interpretability of annotation results is a main concern. In these methods, raw GO-term predictions computed by base binary classifie...

Incorporating Gene Ontology with Conditional-based Clustering to Analyze Gene Expression Data

One of the purposes of the analysis of gene expression data is to cater for the cancer classification and prognosis. Currently, clustering has been introduced as a computational method to assist the analysis. However, these clustering algorithms focus only on statistical similarity and visualization presentation, thus neglecting the biological similarity and the consistency of the annotation in...

Supplemental : Uniclust - clustered and deeply annotated protein sequence databases

As discussed in the main text, the validation may suffer from circularity, as UniProt annotations are dominantly transferred on the basis of sequence similarity. Thereby wrong homology based annotations can overestimate cluster consistency. The evaluation scores we produce can not and should not be interpreted as an absolute value of the quality of clusters, as annotations can suffer from error...

Journal:
  • Bioinformatics

Volume 21 Suppl 1

Publication date: 2005